Which integrations are available for this server?

Google Gemini API: used to perform Optical Character Recognition (OCR), extracting text from images provided as local file paths or base64-encoded strings.

How do I use Gemini OCR MCP Server?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Gemini OCR MCP Server extract the text from the image file at C:/Users/Admin/Desktop/invoice.png"

That's it! The server will respond to your query, and you can continue using it as needed.

Gemini OCR MCP Server

This project provides an OCR (Optical Character Recognition) service through a FastMCP server, backed by the Google Gemini API. It extracts text from images supplied either as a file path or as a base64-encoded string.

Objective

Extract the text from the following image:

[Image: CAPTCHA]

and convert it to plain text, e.g., fbVk

Features

File-based OCR: Extract text directly from an image file on your local system.
Base64 OCR: Extract text from a base64 encoded image string.
Easy to Use: Exposes OCR functionality as simple tools in an MCP server.
Powered by Gemini: Utilizes Google's advanced Gemini models for high-accuracy text recognition.
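
For the base64 tool, the image bytes have to be encoded into a string before they are passed to the server. The following is a minimal sketch using only the Python standard library. The helper names `image_to_base64` and `base64_to_bytes` are hypothetical and not part of this project, and since it is not stated here whether the server accepts a `data:` URI prefix, the sketch produces the bare base64 payload only.

```python
import base64
from pathlib import Path


def image_to_base64(path: str) -> str:
    """Read an image file and return its contents as an ASCII base64 string."""
    raw = Path(path).read_bytes()
    return base64.b64encode(raw).decode("ascii")


def base64_to_bytes(encoded: str) -> bytes:
    """Decode a base64 string back to raw bytes (handy for a sanity check)."""
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(encoded, validate=True)
```

The resulting string can then be pasted or passed as the argument of the base64 OCR tool; decoding it again with `base64_to_bytes` is a quick way to confirm the payload was not truncated.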

Prerequisites

Python 3.8 or higher
A Google Gemini API Key. You can obtain one from Google AI Studio.
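
Before starting the server, it can help to confirm that the key is actually visible to the process. The sketch below reads the two environment variables used in the configuration examples of this README, `GEMINI_API_KEY` and `GEMINI_MODEL`. The function name `read_gemini_settings` is hypothetical, and treating `GEMINI_MODEL` as optional is an assumption, not documented behaviour of the server.

```python
import os
from typing import Mapping, Optional, Tuple


def read_gemini_settings(
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, Optional[str]]:
    """Return (api_key, model) from the environment; raise if the key is missing."""
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    # Assumption: GEMINI_MODEL may be left unset, in which case None is returned.
    model = env.get("GEMINI_MODEL") or None
    return api_key, model
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real environment.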

Setup and Installation

Clone the repository:
git clone https://github.com/WindoC/gemini-ocr-mcp
cd gemini-ocr-mcp
Create and activate a virtual environment:
# Install uv standalone if needed
## On macOS and Linux.
curl -LsSf https://astral.sh/uv/install.sh | sh
## On Windows.
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Install the required dependencies:
uv sync

MCP Configuration Example

If you are running this as a server for a parent MCP application, you can configure it in your main MCP config.json.

Windows Example:

{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "x:\\path\\to\\your\\project\\gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}

Linux/macOS Example:

{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/project/gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}

Note: Remember to replace the placeholder paths with the absolute path to your project directory, and YOUR_GEMINI_API_KEY with your own key.
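
To avoid hand-editing paths, the same configuration can be generated from Python. This is only a sketch under stated assumptions: the function name `build_mcp_config` is hypothetical, and the structure simply mirrors the examples above. `json.dumps` takes care of escaping backslashes in Windows paths, which is why the Windows example shows them doubled.

```python
import json
import os
from typing import Any, Dict


def build_mcp_config(project_dir: str, api_key: str, model: str) -> Dict[str, Any]:
    """Build the mcpServers entry with an absolute project path."""
    # abspath turns a relative path into an absolute one for the current OS.
    project = os.path.abspath(os.path.expanduser(project_dir))
    return {
        "mcpServers": {
            "gemini-ocr-mcp": {
                "command": "uv",
                "args": ["--directory", project, "run", "gemini-ocr-mcp.py"],
                "env": {"GEMINI_MODEL": model, "GEMINI_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    config = build_mcp_config(
        ".", "YOUR_GEMINI_API_KEY", "gemini-2.5-flash-preview-05-20"
    )
    print(json.dumps(config, indent=2))
```

Run from the project directory, this prints a block that can be merged into the parent application's config.json.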